
    Méthodes d'apprentissage appliquées aux heuristiques de recherche pour les problèmes de satisfaction de contraintes (Learning methods applied to search heuristics for constraint satisfaction problems)

    Abstract. Motivation: Constraint Programming (CP) is a paradigm for modelling and solving practical combinatorial problems such as nurse scheduling or airport gate assignment. CP models such problems as Constraint Satisfaction Problems (CSPs), i.e. sets of logical relations (constraints) over variables, and solving a CSP amounts to finding an assignment that satisfies all constraints. Although strong inference techniques such as filtering algorithms shrink the solution space considerably, it typically remains too large to explore exhaustively; search heuristics are therefore needed to guide the search toward promising areas. Significance and theoretical background: Cambazard and Jussien (2005) underline the importance of search heuristics by calling them the "Holy Grail" of both the Operations Research (OR) and Constraint Programming (CP) communities. Two of the best current heuristics are Impact-Based Search (IBS; Refalo, 2004), which exploits the domain reduction observed after branching, and maxSD (Zanarini and Pesant, 2007), which exploits constraint-based solution counting. Other heuristics estimate the distribution of solutions, such as Belief Propagation (BP; Kschischang et al., 2001) and Survey Propagation (SP; Mézard et al., 2002). These inference methods have proven particularly effective on Boolean satisfiability (SAT) problems. Recently, Hsu et al. (2007) proposed Expectation-Maximization Belief-Propagation (EMBP), a variant of these two methods. Such inference methods also reveal characteristics of the structure of the problem being solved: for example, these estimates can identify backbone variables (Kilby et al., 2005), i.e. variables that take the same value in every solution. As a result, an estimate of the solution distribution not only provides useful information for a heuristic but also sheds light on the underlying structure of the problem.
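
    To make the role of such heuristics concrete, the following is a minimal, self-contained Python sketch of backtracking search with forward-checking propagation and an impact-style variable ordering (branch on the variable whose assignments have caused the largest observed domain reduction). The toy all-different instance, the scoring rule, and all function names are illustrative assumptions, not the thesis's implementation or Refalo's exact IBS.

```python
# Minimal sketch (assumption: not the thesis's code) of backtracking search
# on a small CSP, with an impact-style heuristic: branch on the variable
# whose past assignments removed the most values from other domains.
from copy import deepcopy

def propagate(domains, constraints):
    """Forward-checking style filtering: drop values that have no support."""
    changed = True
    while changed:
        changed = False
        for (x, y, rel) in constraints:
            for vx in list(domains[x]):
                if not any(rel(vx, vy) for vy in domains[y]):
                    domains[x].remove(vx)
                    changed = True
            if not domains[x]:
                return None          # dead end: a domain was wiped out
    return domains

def search(domains, constraints, impact):
    domains = propagate(deepcopy(domains), constraints)
    if domains is None:
        return None
    if all(len(d) == 1 for d in domains.values()):
        return {v: next(iter(d)) for v, d in domains.items()}
    # Impact-style variable ordering: highest observed domain reduction first.
    var = max((v for v in domains if len(domains[v]) > 1),
              key=lambda v: impact.get(v, 0.0))
    size_before = sum(len(d) for d in domains.values())
    for val in sorted(domains[var]):
        child = deepcopy(domains)
        child[var] = {val}
        filtered = propagate(child, constraints)
        if filtered is not None:
            reduction = size_before - sum(len(d) for d in filtered.values())
            impact[var] = max(impact.get(var, 0.0), float(reduction))
            result = search(filtered, constraints, impact)
            if result is not None:
                return result
    return None

# Toy instance: three variables over {1,2,3} with pairwise "not equal" constraints.
doms = {v: {1, 2, 3} for v in "xyz"}
neq = lambda a, b: a != b
cons = [(a, b, neq) for a in "xyz" for b in "xyz" if a != b]
print(search(doms, cons, impact={}))
```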

    Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference

    Pre-trained Transformer models like T5 and BART have advanced the state of the art on a wide range of text generation tasks. Compressing these models into smaller ones has become critically important for practical use. Common neural network compression techniques such as knowledge distillation or quantization are limited to static compression, where the compression ratio is fixed. In this paper, we introduce Modular Transformers, a modularized encoder-decoder framework for flexible sequence-to-sequence model compression. Modular Transformers train modularized layers that have the same function as two or more consecutive layers in the original model via module replacing and knowledge distillation. After training, the modularized layers can be flexibly assembled into sequence-to-sequence models that meet different performance-efficiency trade-offs. Experimental results show that after a single training phase, by simply varying the assembling strategy, Modular Transformers can achieve flexible compression ratios from 1.1x to 6x with little to moderate relative performance drop. Comment: ACL 2023 Findings.
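
    As a rough illustration of the module-replacing idea described above, here is a hedged PyTorch sketch in which each "modular" layer is trained to reproduce the output of two consecutive frozen teacher layers, after which the student layers can be stacked on their own. The layer sizes, loss, and one-step training loop are simplified assumptions and do not reproduce the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of the core idea: train one
# "modular" layer to stand in for two consecutive teacher layers, so layers
# can later be mixed and matched for different compression ratios.
import torch
import torch.nn as nn

d_model, n_heads = 256, 4

teacher = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(4)]
)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# One student module per pair of teacher layers (2x compression of the stack).
students = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(2)]
)

opt = torch.optim.AdamW(students.parameters(), lr=1e-4)
x = torch.randn(8, 32, d_model)           # stand-in batch of hidden states

h = x
for i, student in enumerate(students):
    with torch.no_grad():                 # target: output of two teacher layers
        target = teacher[2 * i + 1](teacher[2 * i](h))
    out = student(h)                      # student sees the same input
    loss = nn.functional.mse_loss(out, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    h = target                            # feed teacher activations downstream

# At inference time, a 2-layer student stack approximates the 4-layer teacher.
compressed = nn.Sequential(*students)
```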

    Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes

    As AI systems become an increasing part of people's everyday lives, it becomes ever more important that they understand people's ethical norms. Motivated by descriptive ethics, a field of study that focuses on people's descriptive judgments rather than theoretical prescriptions on morality, we investigate a novel, data-driven approach to machine ethics. We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes. Each anecdote recounts a complex ethical situation, often posing moral dilemmas, paired with a distribution of judgments contributed by the community members. Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement. However, when presented with simplified moral situations, the results are considerably more promising, suggesting that neural models can effectively learn simpler ethical building blocks. A key take-away of our empirical analysis is that norms are not always clean-cut; many situations are naturally divisive. We present a new method to estimate the best possible performance on such tasks with inherently diverse label distributions, and explore likelihood functions that separate intrinsic from model uncertainty. Comment: 18 pages, 14 tables, 18 figures. Accepted to AAAI 2021. For associated code and data, see https://github.com/allenai/scruple
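
    The claim about estimating the best possible performance under inherently diverse label distributions can be illustrated with a simple ceiling (an assumption for illustration, not necessarily the paper's estimator): if gold labels behave as draws from each anecdote's judgment distribution, no classifier can beat predicting the per-item majority label, so expected accuracy is bounded by the mean of the per-item maximum label probabilities.

```python
# Sketch of a simple accuracy ceiling under diverse label distributions
# (illustrative calculation; not necessarily the estimator used in the paper).
import numpy as np

# Hypothetical per-anecdote judgment distributions over two labels
# ("author in the wrong" vs "not in the wrong"), e.g. from community votes.
label_dists = np.array([
    [0.9, 0.1],    # clear-cut case
    [0.55, 0.45],  # divisive case
    [0.2, 0.8],
])

# If each gold label is sampled from its row, the best fixed prediction per
# item is the majority label, so expected accuracy is capped at:
ceiling = label_dists.max(axis=1).mean()
print(f"best possible expected accuracy: {ceiling:.3f}")  # 0.750 here
```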

    Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering

    Understanding narratives requires reasoning about implicit world knowledge related to the causes, effects, and states of situations described in text. At the core of this challenge is how to access contextually relevant knowledge on demand and reason over it. In this paper, we present initial studies toward zero-shot commonsense question answering by formulating the task as inference over dynamically generated commonsense knowledge graphs. In contrast to previous studies for knowledge integration that rely on retrieval of existing knowledge from static knowledge graphs, our study requires commonsense knowledge integration where contextually relevant knowledge is often not present in existing knowledge bases. Therefore, we present a novel approach that generates contextually-relevant symbolic knowledge structures on demand using generative neural commonsense knowledge models. Empirical results on two datasets demonstrate the efficacy of our neuro-symbolic approach for dynamically constructing knowledge graphs for reasoning. Our approach achieves significant performance boosts over pretrained language models and vanilla knowledge models, all while providing interpretable reasoning paths for its predictions.
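
    A hedged sketch of the overall pipeline, on-demand triple generation followed by reasoning over the resulting graph, is given below. The generate_triples function is a hypothetical stand-in for a generative commonsense model (e.g. a COMET-style model), and the word-overlap scoring is a deliberate simplification rather than the paper's inference procedure.

```python
# Illustrative sketch of on-demand knowledge-graph construction for QA.
# `generate_triples` is a hypothetical stand-in for a generative commonsense
# model; the scoring below is a simplification, not the paper's method.
from collections import defaultdict

def generate_triples(context):
    """Pretend model call: returns (head, relation, tail) triples about the context."""
    return [
        ("alex forgot the tickets", "xEffect", "alex feels embarrassed"),
        ("alex forgot the tickets", "xWant", "to go back home"),
        ("alex feels embarrassed", "xAttr", "flustered"),
    ]

def build_graph(triples):
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph

def score_answer(graph, answer):
    """Naive score: count word overlaps between the answer and generated nodes."""
    answer_words = set(answer.lower().split())
    nodes = set(graph) | {t for edges in graph.values() for _, t in edges}
    return sum(len(answer_words & set(node.split())) for node in nodes)

context = "Alex realized at the gate that he forgot the tickets."
graph = build_graph(generate_triples(context))
candidates = ["alex feels embarrassed", "alex feels victorious"]
print(max(candidates, key=lambda a: score_answer(graph, a)))
```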

    Commonsense Knowledge Transfer for Pre-trained Language Models

    Despite serving as the foundation models for a wide range of NLP benchmarks, pre-trained language models have shown limited capability to acquire implicit commonsense knowledge from self-supervision alone, compared to linguistic and factual knowledge, which appears more explicitly in the surface patterns of text. In this work, we introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model. It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model and then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction, which align human language with the underlying commonsense knowledge. Empirical results show that our approach consistently improves the model's performance on downstream tasks that require commonsense reasoning. Moreover, we find that the improvement is more significant in the few-shot setting. This suggests that our approach helps language models better transfer to downstream tasks without extensive supervision by injecting commonsense knowledge into their parameters. Comment: ACL 2023 Findings.
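
    The two self-supervised objectives named above can be pictured with a small sketch that turns knowledge triples into training examples: mask infilling hides the tail of a triple inside a templated sentence, and relation prediction asks which relation links head and tail. The triples, templates, and mask token below are illustrative assumptions, not the paper's data pipeline.

```python
# Sketch of building training examples for the two objectives named in the
# abstract; triples, templates, and the mask token are illustrative only.
triples = [
    ("going for a run", "xEffect", "feels energized"),
    ("borrowing money", "xIntent", "to pay a bill"),
]

MASK = "<mask>"
relations = sorted({rel for _, rel, _ in triples})

def mask_infilling_example(head, rel, tail):
    """Commonsense mask infilling: hide the tail, ask the LM to restore it."""
    text = (f"{head}, as a result, {tail}" if rel == "xEffect"
            else f"{head}, because they want {tail}")
    return text.replace(tail, MASK), tail

def relation_prediction_example(head, rel, tail):
    """Commonsense relation prediction: classify the relation linking head and tail."""
    return f"{head} [SEP] {tail}", relations.index(rel)

for triple in triples:
    print(mask_infilling_example(*triple))
    print(relation_prediction_example(*triple))
```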

    Polynomial Time Construction for Spatially Balanced Latin Squares

    In this paper we propose a construction that generates spatially balanced Latin squares (SBLSs) in polynomial time. These structures are central to the design of agronomic experiments, as they avoid biases that are otherwise unintentionally introduced due to spatial auto-correlation. Previous approaches were able to generate SBLSs of order up to 35 and required about two weeks of computation. Our algorithm runs in O(n²) and generates SBLSs of arbitrary order n where 2n + 1 is prime. For example, this algorithm generates an SBLS of order 999 in a fraction of a second. Funding: National Science Foundation (NSF Expeditions in Computing award for Computational Sustainability, grant 0832782; NSF IIS award, grant 0514429); Intelligent Information Systems Institute, Cornell University (Air Force Office of Scientific Research, AFOSR, grant FA9550-04-1-0151); Natural Sciences and Engineering Research Council of Canada (NSERC).
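
    Since the abstract states the balance property but not the construction itself, the sketch below only checks that property under one common formalization: a Latin square is spatially balanced when the total column distance between every pair of symbols, summed over rows, is the same for all pairs. The checker is an assumption-level illustration, not the paper's O(n²) algorithm.

```python
# Checker for the spatial-balance property (illustrative; the construction
# itself is not given in the abstract, so it is not reproduced here).
from itertools import combinations

def is_latin_square(square):
    n = len(square)
    syms = set(range(n))
    return (all(set(row) == syms for row in square) and
            all({row[c] for row in square} == syms for c in range(n)))

def pair_distances(square):
    """Total column distance between each pair of symbols, summed over rows."""
    n = len(square)
    pos = [{sym: col for col, sym in enumerate(row)} for row in square]
    return {
        (a, b): sum(abs(row[a] - row[b]) for row in pos)
        for a, b in combinations(range(n), 2)
    }

def is_spatially_balanced(square):
    dists = pair_distances(square)
    return is_latin_square(square) and len(set(dists.values())) == 1

# Tiny example (order 2 is trivially balanced: only one symbol pair exists).
print(is_spatially_balanced([[0, 1], [1, 0]]))  # True
```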

    From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models

    Dogwhistles are coded expressions that simultaneously convey one meaning to a broad audience and a second one, often hateful or provocative, to a narrow in-group; they are deployed to evade both political repercussions and algorithmic content moderation. For example, in the sentence 'we need to end the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to many, but secretly means 'Jewish' to a select few. We present the first large-scale computational investigation of dogwhistles. We develop a typology of dogwhistles, curate the largest-to-date glossary of over 300 dogwhistles with rich contextual information and examples, and analyze their usage in historical U.S. politicians' speeches. We then assess whether a large language model (GPT-3) can identify dogwhistles and their meanings, and find that GPT-3's performance varies widely across types of dogwhistles and targeted groups. Finally, we show that harmful content containing dogwhistles avoids toxicity detection, highlighting online risks of such coded language. This work sheds light on the theoretical and applied importance of dogwhistles in both NLP and computational social science, and provides resources for future research in modeling dogwhistles and mitigating their online harms. Comment: ACL 2023; see https://dogwhistles.allen.ai/ for the glossary and other material